# Get started with TensorFlow Lite

Using a TensorFlow Lite model in your mobile app involves several steps: you
must choose a pre-trained or custom model, convert the model to the TensorFlow
Lite format, and finally, integrate the model in your app.

## 1. Choose a model

Depending on the use case, you can choose one of the popular open-source models,
such as *InceptionV3* or *MobileNets*, re-train one of these models with a custom
data set, or build your own custom model.

### Use a pre-trained model

[MobileNets](https://research.googleblog.com/2017/06/mobilenets-open-source-models-for.html)
is a family of mobile-first computer vision models for TensorFlow designed to
maximize accuracy while respecting the restricted resources of on-device or
embedded applications. MobileNets are small, low-latency, low-power models
parameterized to meet the resource constraints of a variety of use cases. They
can be used for classification, detection, embeddings, and segmentation, similar
to other popular large-scale models such as
[Inception](https://arxiv.org/pdf/1602.07261.pdf). Google provides 16 pre-trained
[ImageNet](http://www.image-net.org/challenges/LSVRC/) classification checkpoints
for MobileNets that can be used in mobile projects of all sizes.
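The MobileNets paper parameterizes each model with a width multiplier (alpha, which thins the channels of every layer) and a resolution multiplier (rho, which shrinks the input image). The sketch below follows that paper's cost model for one depthwise separable convolution, `Dk*Dk*(alpha*M)*(rho*Df)^2 + (alpha*M)*(alpha*N)*(rho*Df)^2`, to show why both knobs matter: cost falls roughly with alpha squared and rho squared. The layer sizes are illustrative assumptions, not taken from a specific checkpoint. The 16 checkpoints correspond to four width multipliers crossed with four input resolutions, and the generated names follow the `mobilenet_v1_<width>_<resolution>` pattern used later in this guide.

```python
from itertools import product

def separable_conv_mult_adds(kernel, in_ch, out_ch, feature, alpha=1.0, rho=1.0):
    """Mult-adds of one depthwise separable conv layer, per the MobileNets cost model.

    kernel: spatial kernel size (D_K); in_ch/out_ch: channels (M, N);
    feature: side of the square feature map (D_F);
    alpha: width multiplier; rho: resolution multiplier.
    """
    m, n = int(alpha * in_ch), int(alpha * out_ch)
    f = int(rho * feature)
    depthwise = kernel * kernel * m * f * f  # one k x k filter per input channel
    pointwise = m * n * f * f                # 1x1 convolution that mixes channels
    return depthwise + pointwise

def standard_conv_mult_adds(kernel, in_ch, out_ch, feature):
    """Mult-adds of an ordinary convolution with the same shape, for comparison."""
    return kernel * kernel * in_ch * out_ch * feature * feature

# Hypothetical layer: 3x3 kernel, 512 -> 512 channels, 14x14 feature map.
standard = standard_conv_mult_adds(3, 512, 512, 14)
full = separable_conv_mult_adds(3, 512, 512, 14)
half = separable_conv_mult_adds(3, 512, 512, 14, alpha=0.5)
print(f"separable vs. standard: {standard / full:.1f}x fewer mult-adds")
print(f"alpha=0.5 vs. alpha=1.0: {full / half:.1f}x fewer mult-adds")

# The 16 ImageNet checkpoints: 4 width multipliers x 4 input resolutions.
configs = [f"mobilenet_v1_{a}_{r}"
           for a, r in product(("0.25", "0.50", "0.75", "1.0"), (128, 160, 192, 224))]
print(len(configs), configs[0], configs[-1])
```

Smaller alpha and rho trade some accuracy for lower latency and size, which is how one architecture covers "mobile projects of all sizes".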

[Inception-v3](https://arxiv.org/abs/1512.00567) is an image recognition model
that achieves fairly high accuracy at recognizing general objects from 1000
classes, for example, "Zebra", "Dalmatian", and "Dishwasher". The model extracts
general features from input images using a convolutional neural network and
classifies them based on those features with fully connected and softmax layers.
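The final classification stage can be sketched in plain Python: a fully connected layer turns a feature vector into one score (logit) per class, and softmax turns those scores into probabilities. The weights, feature vector, and three-label list below are made-up toy values, not Inception-v3's real features or its 1000 classes.

```python
import math

def fully_connected(features, weights, biases):
    """Dense layer: logits[j] = sum_i features[i] * weights[i][j] + biases[j]."""
    return [
        sum(f * weights[i][j] for i, f in enumerate(features)) + biases[j]
        for j in range(len(biases))
    ]

def softmax(logits):
    """Numerically stable softmax: subtract the largest logit before exponentiating."""
    top = max(logits)
    exps = [math.exp(x - top) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

labels = ["Zebra", "Dalmatian", "Dishwasher"]  # toy subset of the 1000 classes
features = [0.9, 0.1, 0.4]                     # hypothetical CNN feature vector
weights = [[2.0, 0.5, -1.0],
           [0.1, 1.5, 0.0],
           [-0.5, 0.2, 1.0]]
biases = [0.0, 0.0, 0.0]

probs = softmax(fully_connected(features, weights, biases))
best = max(range(len(labels)), key=lambda j: probs[j])
print(labels[best], round(probs[best], 3))
```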

[On Device Smart Reply](https://research.googleblog.com/2017/02/on-device-machine-intelligence.html)
is an on-device model that provides one-touch replies for incoming text messages
by suggesting contextually relevant messages. The model is built specifically for
memory-constrained devices, such as watches and phones, and has been successfully
used in Smart Replies on Android Wear. Currently, this model is Android-specific.

These pre-trained models are [available for download](hosted_models.md).

### Re-train Inception-v3 or MobileNet for a custom data set

These pre-trained models were trained on the *ImageNet* data set, which contains
1000 predefined classes. If these classes are not sufficient for your use case,
the model needs to be re-trained. This technique is called *transfer learning*:
it starts with a model that has already been trained on one problem and
retrains it on a similar problem. Training a deep network from scratch can take
days, but transfer learning is fairly quick. To do this, you need to generate a
custom data set labeled with the relevant classes.
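A minimal sketch of the idea, assuming the pre-trained layers stay frozen and only a new final layer is trained: `extract_features` is a hypothetical stand-in for the frozen network (it just looks up fixed two-dimensional "bottleneck" vectors), and the new head is a logistic-regression layer trained with plain gradient descent. The two classes and all feature values are invented for illustration.

```python
import math

def extract_features(image_id):
    """Hypothetical stand-in for the frozen pre-trained layers.

    Real transfer learning would run the image through e.g. MobileNet up to
    its last hidden layer; here we return fixed toy feature vectors.
    """
    toy_bottlenecks = {
        "daisy_1": [2.0, 0.1], "daisy_2": [1.8, 0.3], "daisy_3": [2.2, 0.0],
        "rose_1": [0.2, 1.9], "rose_2": [0.1, 2.1], "rose_3": [0.4, 1.7],
    }
    return toy_bottlenecks[image_id]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_new_head(examples, epochs=200, lr=0.5):
    """Train only a new logistic-regression output layer on frozen features."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for feats, label in examples:
            p = sigmoid(w[0] * feats[0] + w[1] * feats[1] + b)
            err = p - label  # gradient of the log loss with respect to the logit
            w = [w[0] - lr * err * feats[0], w[1] - lr * err * feats[1]]
            b -= lr * err
    return w, b

image_ids = ["daisy_1", "daisy_2", "daisy_3", "rose_1", "rose_2", "rose_3"]
data = [(extract_features(k), 1 if k.startswith("rose") else 0) for k in image_ids]
w, b = train_new_head(data)
preds = [sigmoid(w[0] * f[0] + w[1] * f[1] + b) > 0.5 for f, _ in data]
print(preds)
```

Because only the small head is learned while the expensive feature extractor is reused, each training step is cheap, which is why transfer learning is much faster than training from scratch.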

The [TensorFlow for Poets](https://codelabs.developers.google.com/codelabs/tensorflow-for-poets/)
codelab walks through the re-training process step-by-step. The code supports
both floating point and quantized inference.

### Train a custom model

A developer may choose to train a custom model using TensorFlow (see the
[TensorFlow tutorials](https://www.tensorflow.org/tutorials/) for examples of building and training
models). If you have already written a model, the first step is to export it
to a `tf.GraphDef` file. This is required because some formats do not store the
model structure outside the code, and other parts of the framework, such as the
converter, need to read that structure from a file. See
[Exporting the Inference Graph](https://www.tensorflow.org/tutorials/keras/save_and_restore_models#save_the_entire_model)
to create a file for the custom model.

TensorFlow Lite currently supports a subset of TensorFlow operators. Refer to
the [TensorFlow Lite & TensorFlow Compatibility Guide](ops_compatibility.md)
for supported operators and their usage. This set of operators will continue to
grow in future TensorFlow Lite releases.

## 2. Convert the model format

The [TensorFlow Lite Converter](../convert/index.md) accepts the following file
formats:

*   `SavedModel` — A `GraphDef` and checkpoint with a signature that labels
    input and output arguments to a model. See the documentation for converting
    SavedModels using [Python](../convert/python_api.md#basic_savedmodel) or using
    the [command line](../convert/cmdline_examples.md#savedmodel).
*   `tf.keras` - An HDF5 file containing a model with weights and input and
    output arguments generated by `tf.keras`. See the documentation for
    converting HDF5 models using
    [Python](../convert/python_api.md#basic_keras_file) or using the
    [command line](../convert/cmdline_examples.md#keras).
*   `frozen tf.GraphDef` — A `tf.GraphDef` that does not contain variables. A
    `GraphDef` can be converted to a `frozen GraphDef` by taking a
    checkpoint and a `GraphDef`, and converting each variable into a constant
    using the value retrieved from the checkpoint. Instructions on converting a
    `tf.GraphDef` to a TensorFlow Lite model are described in the next
    subsection.
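As a rough illustration of how these three inputs differ on disk, the hypothetical helper below guesses which kind of input a path holds: a directory containing `saved_model.pb` (or `saved_model.pbtxt`) is treated as a SavedModel, an `.h5`/`.hdf5` file as a Keras HDF5 model, and a `.pb`/`.pbtxt` file as a frozen `GraphDef`. This is a file-name heuristic only; it does not inspect contents, so a `.pb` file could still be an unfrozen graph.

```python
import os
import tempfile

def guess_converter_input(path):
    """Guess the TensorFlow Lite Converter input format from a path (heuristic only)."""
    if os.path.isdir(path):
        for name in ("saved_model.pb", "saved_model.pbtxt"):
            if os.path.isfile(os.path.join(path, name)):
                return "saved_model"
        raise ValueError(f"{path} is a directory without saved_model.pb or .pbtxt")
    ext = os.path.splitext(path)[1].lower()
    if ext in (".h5", ".hdf5"):
        return "keras_hdf5"
    if ext in (".pb", ".pbtxt"):
        return "frozen_graph_def"  # assumes the graph has already been frozen
    raise ValueError(f"unrecognized model file: {path}")

# Demo with a throwaway SavedModel-like directory and two example file names.
with tempfile.TemporaryDirectory() as model_dir:
    open(os.path.join(model_dir, "saved_model.pb"), "wb").close()
    saved_kind = guess_converter_input(model_dir)
frozen_kind = guess_converter_input("/tmp/frozen_mobilenet_v1_224.pb")
keras_kind = guess_converter_input("my_model.h5")
print(saved_kind, frozen_kind, keras_kind)
```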

### Converting a tf.GraphDef

TensorFlow models may be saved as a .pb or .pbtxt `tf.GraphDef` file. In order
to convert the `tf.GraphDef` file to TensorFlow Lite, the model must first be
frozen. This process involves several file formats including the `frozen
GraphDef`:

*   `tf.GraphDef` (.pb or .pbtxt) — A protobuf that represents the TensorFlow
    training or computation graph. It contains operators, tensors, and variables
    definitions.
*   *checkpoint* (.ckpt) — Serialized variables from a TensorFlow graph. Since
    this does not contain a graph structure, it cannot be interpreted by itself.
*   *TensorFlow Lite model* (.tflite) — A serialized
    [FlatBuffer](https://google.github.io/flatbuffers/) that contains TensorFlow
    Lite operators and tensors for the TensorFlow Lite interpreter.
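Because a `.tflite` file is a FlatBuffer, it can be sanity-checked without TensorFlow: the TensorFlow Lite schema declares the file identifier `TFL3`, and a FlatBuffer stores its file identifier in bytes 4 to 7, right after the 4-byte offset of the root table. The sketch below is a quick heuristic under that assumption; `looks_like_tflite` and `file_looks_like_tflite` are hypothetical helpers, and passing the check does not prove the model is valid.

```python
def looks_like_tflite(data: bytes) -> bool:
    """Heuristic: does this buffer carry the TensorFlow Lite FlatBuffer identifier?"""
    # FlatBuffer layout: bytes 0-3 hold the little-endian offset of the root table,
    # bytes 4-7 hold the file identifier ("TFL3" for TensorFlow Lite models).
    return len(data) >= 8 and data[4:8] == b"TFL3"

def file_looks_like_tflite(path: str) -> bool:
    """Read only the first 8 bytes of a file and apply the identifier check."""
    with open(path, "rb") as f:
        return looks_like_tflite(f.read(8))

# Synthetic headers for illustration (not real models).
good_header = (28).to_bytes(4, "little") + b"TFL3"
bad_header = b"\x08\x00\x00\x00" + b"ABCD"
print(looks_like_tflite(good_header), looks_like_tflite(bad_header))
```

On a converted model you would call, for example, `file_looks_like_tflite("/tmp/mobilenet_v1_1.0_224.tflite")`.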

You must have checkpoints that contain trained weights. The `tf.GraphDef` file
only contains the structure of the graph. The process of merging the checkpoint
values with the graph structure is called *freezing the graph*.
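Conceptually, freezing walks the graph and replaces every variable node with a constant node holding the value read from the checkpoint. The toy sketch below models a graph as a list of dictionaries and a checkpoint as a name-to-value mapping; these are simplified stand-ins, not the real `GraphDef` protobuf or checkpoint format, and `freeze` is a hypothetical function rather than `freeze_graph` itself (the real tool also drops nodes that the requested output nodes do not depend on).

```python
def freeze(graph_nodes, checkpoint):
    """Return a copy of a toy graph with every variable turned into a constant."""
    frozen = []
    for node in graph_nodes:
        if node["op"] in ("Variable", "VariableV2"):
            if node["name"] not in checkpoint:
                raise KeyError(f"checkpoint has no value for {node['name']}")
            # Bake the trained value into the graph as a constant.
            frozen.append({"name": node["name"], "op": "Const",
                           "inputs": [], "value": checkpoint[node["name"]]})
        else:
            frozen.append(dict(node))  # other ops keep their structure unchanged
    return frozen

graph = [
    {"name": "input", "op": "Placeholder", "inputs": []},
    {"name": "weights", "op": "VariableV2", "inputs": []},
    {"name": "logits", "op": "MatMul", "inputs": ["input", "weights"]},
]
checkpoint = {"weights": [[0.5], [-1.0]]}  # toy "trained" values
frozen_graph = freeze(graph, checkpoint)
print([n["op"] for n in frozen_graph])
```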

`tf.GraphDef` and checkpoint files for MobileNet models are available
[here](https://github.com/tensorflow/models/blob/master/research/slim/nets/mobilenet_v1.md).

To freeze the graph, use a command like the following, adjusting the arguments
for your model:

```sh
freeze_graph --input_graph=/tmp/mobilenet_v1_224.pb \
  --input_checkpoint=/tmp/checkpoints/mobilenet-10202.ckpt \
  --input_binary=true \
  --output_graph=/tmp/frozen_mobilenet_v1_224.pb \
  --output_node_names=MobilenetV1/Predictions/Reshape_1
```

Set the `input_binary` flag to `true` when reading a binary protobuf (a `.pb`
file), and to `false` for a text protobuf (a `.pbtxt` file).

Set `input_graph` and `input_checkpoint` to the respective filenames. The
`output_node_names` may not be obvious outside of the code that built the model.
The easiest way to find them is to visualize the graph, either with
[TensorBoard](https://www.tensorflow.org/guide/summaries_and_tensorboard) or
`graphviz`.
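When a visualization tool is not at hand, a quick heuristic is to look for nodes that no other node consumes: in an inference graph, the prediction node is usually among them. The sketch below applies that heuristic to a simplified toy node list whose names are modeled on the MobileNet example; in practice you would read names and inputs from the `GraphDef` and still confirm the choice by inspecting the graph, since training-only or summary ops can also be unconsumed. `candidate_output_nodes` is a hypothetical helper.

```python
def candidate_output_nodes(nodes):
    """Return names of nodes whose output is not used as an input by any other node."""
    consumed = set()
    for node in nodes:
        for inp in node["inputs"]:
            # GraphDef inputs may look like "name:1" (output index) or "^name" (control edge).
            consumed.add(inp.lstrip("^").split(":")[0])
    return [n["name"] for n in nodes if n["name"] not in consumed]

nodes = [
    {"name": "input", "inputs": []},
    {"name": "MobilenetV1/Logits/Conv2d_1c_1x1/BiasAdd", "inputs": ["input"]},
    {"name": "MobilenetV1/Predictions/Softmax",
     "inputs": ["MobilenetV1/Logits/Conv2d_1c_1x1/BiasAdd"]},
    {"name": "MobilenetV1/Predictions/Reshape_1",
     "inputs": ["MobilenetV1/Predictions/Softmax"]},
]
print(candidate_output_nodes(nodes))
```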

The frozen `GraphDef` is now ready for conversion to the `FlatBuffer` format
(.tflite) for use on Android or iOS devices. For Android, the TensorFlow Lite
Converter tool supports both float and quantized models. To convert the frozen
`GraphDef` to the .tflite format, use a command similar to the following:

```sh
tflite_convert \
  --output_file=/tmp/mobilenet_v1_1.0_224.tflite \
  --graph_def_file=/tmp/mobilenet_v1_1.0_224/frozen_graph.pb \
  --input_arrays=input \
  --output_arrays=MobilenetV1/Predictions/Reshape_1
```

The
[frozen_graph.pb](https://storage.googleapis.com/download.tensorflow.org/models/mobilenet_v1_1.0_224_frozen.tgz)
file used here is available for download. Choosing values for the
`input_arrays` and `output_arrays` arguments is not straightforward. The easiest
way to find them is to explore the graph using
[TensorBoard](https://www.tensorflow.org/guide/summaries_and_tensorboard). For
`output_arrays`, reuse the output node names you passed to
`--output_node_names` in the `freeze_graph` step.
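Because the output node name must match exactly in both steps (node names are case-sensitive), it can help to define it once and build both commands from it. The sketch below only assembles argument lists using the flags shown in this guide; the paths, the checkpoint name, and the `build_commands` helper are placeholders. Actually running the commands (for example with `subprocess.run(freeze_cmd, check=True)`) assumes `freeze_graph` and `tflite_convert` are installed and on your `PATH`.

```python
import shlex

def build_commands(graph, checkpoint, frozen, tflite, input_node, output_node):
    """Assemble freeze_graph and tflite_convert argument lists sharing one output node."""
    # .pb files are binary protobufs; .pbtxt files are text protobufs.
    input_binary = "false" if graph.endswith(".pbtxt") else "true"
    freeze_cmd = [
        "freeze_graph",
        f"--input_graph={graph}",
        f"--input_checkpoint={checkpoint}",
        f"--input_binary={input_binary}",
        f"--output_graph={frozen}",
        f"--output_node_names={output_node}",
    ]
    convert_cmd = [
        "tflite_convert",
        f"--output_file={tflite}",
        f"--graph_def_file={frozen}",  # the frozen graph produced by freeze_graph
        f"--input_arrays={input_node}",
        f"--output_arrays={output_node}",
    ]
    return freeze_cmd, convert_cmd

freeze_cmd, convert_cmd = build_commands(
    graph="/tmp/mobilenet_v1_224.pb",
    checkpoint="/tmp/checkpoints/mobilenet-10202.ckpt",
    frozen="/tmp/frozen_mobilenet_v1_224.pb",
    tflite="/tmp/mobilenet_v1_1.0_224.tflite",
    input_node="input",
    output_node="MobilenetV1/Predictions/Reshape_1",
)
print(shlex.join(freeze_cmd))
print(shlex.join(convert_cmd))
```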

### Full converter reference

The [TensorFlow Lite Converter](../convert/index.md) can be used from
[Python](../convert/python_api.md) or from the
[command line](../convert/cmdline_examples.md). This allows you to integrate the
conversion step into the model design workflow, ensuring the model is easy to
convert to a mobile inference graph.

### Ops compatibility

Refer to the [ops compatibility guide](ops_compatibility.md) for
troubleshooting help. If that doesn't resolve the problem, please
[file an issue](https://github.com/tensorflow/tensorflow/issues).

### Graph Visualization tool

The [development repo](https://github.com/tensorflow/tensorflow) contains a tool
to visualize TensorFlow Lite models after conversion. To build and run the
[visualize.py](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/lite/tools/visualize.py)
tool:

```sh
bazel run tensorflow/lite/tools:visualize -- model.tflite model_viz.html
```

This generates an interactive HTML page listing subgraphs, operations, and a
graph visualization.

## 3. Use the TensorFlow Lite model for inference in a mobile app

After completing the prior steps, you should now have a `.tflite` model file.

### Android

Since Android apps are written in Java and the core TensorFlow Lite library is
in C++, a JNI library is provided as an interface. It is meant only for
inference: it provides the ability to load a model, set up inputs, and run the
model to calculate outputs.

The open-source Android demo app uses the JNI interface and is available
[on GitHub](https://github.com/tensorflow/tensorflow/tree/master/tensorflow/lite/java/demo/app).
You can also download a
[prebuilt APK](http://download.tensorflow.org/deps/tflite/TfLiteCameraDemo.apk).
See the <a href="./android.md">Android demo</a> guide for details.

The <a href="./android.md">Android mobile</a> guide has instructions for
installing TensorFlow on Android and setting up `bazel` and Android Studio.

### iOS

To integrate a TensorFlow Lite model in an iOS app, see the
[TensorFlow Lite for iOS](ios.md) guide and the <a href="./ios.md">iOS demo</a>
guide.

#### Core ML support

Core ML is a machine learning framework used in Apple products. In addition to
using TensorFlow Lite models directly in your applications, you can convert
trained TensorFlow models to the
[Core ML](https://developer.apple.com/machine-learning/) format for use on Apple
devices. To use the converter, refer to the
[TensorFlow-CoreML converter documentation](https://github.com/tf-coreml/tf-coreml).

### ARM32 and ARM64 Linux

Compile TensorFlow Lite for a Raspberry Pi by following the
[RPi build instructions](build_rpi.md). Compile TensorFlow Lite for a generic
aarch64 board, such as Odroid C2, Pine64, or NanoPi, by following the
[ARM64 Linux build instructions](build_arm64.md). Either build compiles a static
library file (`.a`) used to build your app. There are plans for Python bindings
and a demo app.

## 4. Optimize your model (optional)

There are two options. If you plan to run on CPU, we recommend that you quantize
your weights and activation tensors. If a GPU is available, another option is to
run on it, which suits massively parallelizable workloads.

### Quantization

Compress your model by lowering the precision of its parameters (i.e., neural
network weights) from their training-time 32-bit floating-point representations
to much smaller and more efficient 8-bit integer ones.
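The arithmetic behind this compression can be sketched with a simple affine mapping (a scale and a zero point) of float weights onto the 256 values of an unsigned 8-bit integer, and back again. This is a generic illustration using each tensor's min/max range, an assumption for the example rather than the exact scheme the converter applies. It shows both effects: storage drops from 4 bytes to 1 byte per weight, and each recovered value is off by at most about half a quantization step.

```python
def quantize(values, num_bits=8):
    """Affine-quantize floats to unsigned integers using the tensor's min/max range."""
    qmin, qmax = 0, 2 ** num_bits - 1
    lo, hi = min(min(values), 0.0), max(max(values), 0.0)  # include 0.0 so it is exact
    scale = (hi - lo) / (qmax - qmin) or 1.0               # avoid a zero scale
    zero_point = round(qmin - lo / scale)
    q = [min(qmax, max(qmin, round(v / scale) + zero_point)) for v in values]
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    """Map quantized integers back to approximate float values."""
    return [(x - zero_point) * scale for x in q]

weights = [-0.82, -0.31, 0.0, 0.047, 0.5, 1.27]  # toy float32 weights
q, scale, zp = quantize(weights)
restored = dequantize(q, scale, zp)
max_error = max(abs(a - b) for a, b in zip(weights, restored))

print(q)
print(f"float32: {4 * len(weights)} bytes, uint8: {len(q)} bytes")
print(f"max error {max_error:.4f} vs. half step {scale / 2:.4f}")
```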

The heaviest computations then run quickly in lower precision, while the most
sensitive ones keep higher precision. This typically results in little to no
loss of accuracy for the task, together with a significant speed-up over pure
floating-point execution.

The post-training quantization technique is integrated into the TensorFlow Lite
conversion tool. Getting started is easy: after building your TensorFlow model,
simply enable the `post_training_quantize` flag in the TensorFlow Lite
conversion tool. Assuming that the saved model is stored in `saved_model_dir`,
the quantized TensorFlow Lite FlatBuffer can be generated in Python:

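```python
import tensorflow as tf
saved_model_dir = "/tmp/saved_model"  # placeholder: path to your SavedModel
converter = tf.contrib.lite.TocoConverter.from_saved_model(saved_model_dir)
converter.post_training_quantize = True  # quantize weights during conversion
tflite_quantized_model = converter.convert()
with open("quantized_model.tflite", "wb") as f:
    f.write(tflite_quantized_model)
```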

Read the full documentation [here](../performance/post_training_quantization.md)
and see a tutorial
[here](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/lite/tutorials/post_training_quant.ipynb).

### GPU

GPUs are designed to have high throughput for massively parallelizable
workloads. Thus, they are well suited for deep neural nets, which consist of a
huge number of operators, each working on some input tensor(s) that can be
easily divided into smaller workloads and carried out in parallel, typically
resulting in lower latency.
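The idea of dividing one operator's work into independent pieces can be sketched on the CPU with Python's `concurrent.futures`: an element-wise ReLU over a flat tensor is split into contiguous chunks that workers process independently, and the results are stitched back together in order. This only illustrates the decomposition; Python threads do not deliver GPU-style speed-ups (they are limited by the global interpreter lock), and `parallel_relu` is a hypothetical helper, not part of any TensorFlow Lite API.

```python
from concurrent.futures import ThreadPoolExecutor

def relu_chunk(chunk):
    """Element-wise ReLU on one independent slice of the tensor."""
    return [max(0.0, x) for x in chunk]

def parallel_relu(tensor, num_workers=4):
    """Split a flat tensor into chunks, apply ReLU to each concurrently, and rejoin."""
    if not tensor:
        return []
    size = -(-len(tensor) // num_workers)  # ceiling division: chunk length
    chunks = [tensor[i:i + size] for i in range(0, len(tensor), size)]
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        parts = list(pool.map(relu_chunk, chunks))  # map preserves chunk order
    return [x for part in parts for x in part]

tensor = [(-1) ** i * i * 0.5 for i in range(10)]
print(parallel_relu(tensor))
```

Because every element's result depends only on that element, the chunks can be processed in any order and on any number of workers, which is the property GPUs exploit at much larger scale.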

Another benefit of GPU inference is its power efficiency. GPUs carry out these
computations in a highly efficient and optimized manner, so they consume less
power and generate less heat than when the same task is run on CPUs.

Read the tutorial [here](../performance/gpu.md) and full documentation [here](../performance/gpu_advanced.md).
